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Text to Speech Synthesis system requires an important module which will convert graphemes to phonemes. 
Grapheme to Phoneme Conversion module takes sequence of sentences as input and map to corresponding phonetic signal. 
These phonetic signals also known as utterances are stored as 'wav' files on the machine. There are various methods of 
associating or aligning these wave files with graphemes. This paper discusses important issues relating to Grapheme to 
Phoneme alignment for Sanskrit text written in Devanagari script. Rule based syllable clustering/segmentation was found 
suitable for the task. Rule base was created using Regular Expression Builder and was hand crafted. Further, an approach 
for associating this Linguistic Syllable Unit with manually segmented utterance units was experimented. Linguistic 
Syllable Units were segmented using Rule Base from Sanskrit Texts. Further, These Linguistic Syllable units were 
re-segmented to be aligned with phonetic utterances i.e phonemes, for integrating with Text to Speech Synthesis System 
for Sanskrit. 

KEYWORDS: Phonetic Rules, Linguistic Syllable Unit, Grapheme, Virama, Matra, Halant, Svara, Rule Based Syllable 
Segment 

INTRODUCTION 

Sanskrit is considered to be mother of most of the Indian Languages. Sanskrit has a rich collection Phonetic texts 
viz. Siksa, pratisakhya, Astadhyayi etc., hereafter referred as Sanskrit Phonetics. These texts enumerate Phonetic rules for 
syllabification in the form of aphorisms. Sanskrit language maintains orthography of utterances with written symbols. 
Orthography is the term used to indicate 'what is written is what is uttered'. The Writing System irregularity is main cause 
of complexity in doing Grapheme to Phoneme Conversion [Tatyana2005] . Writing System irregularity implies that there is 
no one to one correspondence between 'what is written is what is uttered'. Even though Sanskrit text is represented in 
majorly in Devanagari Script, it is possible to find the same being represented in regional languages of India. In this paper 
the discussion is limited to Sanskrit Text represented in Devanagari Script. PaninTya Siksa mentions sixty three speech 
Sounds. Among them twenty-one are vowels and remaining are consonants, the details may be found in the Phonetic rules 
Subhra Basu Ghosh (2003). 

Text to Speech Synthesis (hereafter referred as TTS) and Speech recognition technologies are finding fast growth. 
TTS for any language requires a tool that associates every letter symbol in the stream of text to a corresponding phone or 
utterance. This task is carried out by Grapheme to Phoneme Alignment Module. Grapheme to Phoneme Alignment is 
essential for any orthographic language also, for the cases when the number of letter is greater than that of phonemes of a 
language [Tatyana2005]. Rule base extracted from Sanskrit Phonetic was used to build this module. The research was 
carried out in Rashtriya Sanskrit Vidyapeetha, a Central Sanskrit University, where both Sanskrit Linguists and Computer 
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Professionals are available. This University has rich collection Unicode text digitized. Sanskrit being rarely spoken finds 
less number of researchers working for it. 

Definitions of Important Terminology Relating to Sanskrit Phonetics 

The most usual term used in Sanskrit phonology for syllable is akshara, which literally means imperishable. 
In theory a syllable consists of a sequence of sounds containing one peak of prominence. This peak of prominence is 
known as svara or vowel. In practice it is often impossible to define the limits of a syllable, because there is no means of 
fixing any exact points of minimum prominence Vidhata Mishra, (1972). Sanskrit is a vowel-centric language and hence 
consonants must be clustered with a vowel preceding it or following it. This process of clustering consonant or consonant 
cluster with either preceding vowel or next vowel is known as syllabification. This syllabification is governed by set of 
Sanskrit phonetic rules. As for as Sanskrit Phoneticians, are concerned, they prefer to adhere to rules enumerated in 
Phonetic Texts. The grapheme cluster that is obtained by associating consonant conjuncts with either vowel preceding or 
next to it is called Linguistic Syllable Unit (LSU). LSU should be recombined to align with phonemes. 

Different methods were proposed to handle this Grapheme to Phoneme Alignment. In their research work various 
researchers proposed decision based tree Alan W Black, et al (1998), Ananlada (2000), statistically based Stanley (2003), 
and pronunciation analogy Example Based Grapheme phoneme conversion as proposed by Paisarn et al (2006) segmented 
the syllables either by rule based procedure or by statistically based syllable segmenter. Paisarn proposed this for Thai 
Language. As the rules used by the paisam did not coincide with the phonetic rules of Sanskrit, it was felt to develop Rule 
Based syllable segmenter for Sanskrit. In This paper descripes Grapheme to Phoneme Alignment with Rule Based 
Segmentation 

GRAPHEME TO PHONEME ALIGNMENT 
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Figure 1: Block Diagram of Grapheme to Phoneme Conversion 

The Following steps were taken up in sequence to take up grapheme to phoneme alignment task for Sanskrit text. 
Record all phonemes possible 

Manually segment the utterances in groups using Wave lab or SFS, Wave Pad Apps. 

Syllabification of stream of Devanagari texts using proposed algorithm (Rule Based Syllable Segmenting) 

Divide LSU into Phoneme alignable Units basing utterance data available. 
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Integrate Graphme to Phoneme module with Text to Speech Synthesis for Sanskrit. 

Compare manually 

Creating Phoneme Utterance 

A Sanskrit Phonetician, with no defects of speech, as enumerated in Phonetic texts was chosen to create samples 
for this experiment. A good quality recorder, of sony make, PCM-D50, Linear PCM Recorder used to record the signal. 
The Recording was done in Acoustic Studio in University. Low Cut Filter, Limiter Options were enabled. Sampling rate 
was set at 44.1 kHz; bit rate was set at 16 bit stereo. The Speech Talent was asked to utter all possible combinations from 
the basic Sanskrit phone set, giving a gap of silence between each combination. Further, Amara Kosha, a Sanskrit 
Nighantu was depended in selecting specific consonant conjunct contexts, for recording. This wave files were segmented 
using WAVELAB 4.0 and were named accordingly. The files generated were so named that Unicode numbers 
corresponding to that utterance sequence formed the part of wave file name. Character "O" was inserted in between each 

Unicode number to isolate each phoneme. For example utterance symbolised 'ST' is named as "232502359", '•sft' is named 

as 23580235202368. These numbers are decimal coded Unicode numbers corresponding to W>, 31, T, f matra 

respectively. As can be seen from the algorithm ASW () function returns Decimal Coded value for the Unicode string, in 
Microsoft environment. 

Syllabification 

The following are Syllabification Rules extracted from Sanskrit Phonetic Texts: These rules were used in next to 
char function 

• Vowel is the nucleus of the syllable. 

• Consonant or consonant cluster should be associated with succeeding vowel 

• The final consonant in the word will follow preceding vowel. 

• Anusvara and visarga belong to preceding vowel. 

• The first letter of consonant conjunct go with either preceding vowel or succeeding vowel 

• The kramaja letter which means a duplicated letter is a part of previous vowel. 

• According to Taittiriya Pratisakhya, the plosive in the group plosive + spirant belong to succeeding vowel. 

• In a clustered consonant conjunct consonant + semivowel, consonant goes with succeeding syllable. 

• Permitted finals rule says that only one consonant is allowed after last vowel in the word. 

• Virama characters define the boundaries of the words or sentences. 

• Halant is a symbol which is used to create boundary between to phones of phone set, which has no utterance 
mapping. 

• White Space, coma, Full Stop, Question marks take 1,2,3,4 matras duration silence. These are known as virama in 
Sanskrit phonetics. 



www.tjprc.org 



editor@tjprc.org 



96 



Chandra Sekharam Bondu & Ramakrishna S R 



The code for next to char function is as shown on the figure 2 and declaration on figure 3. The next to char 
function takes two parameters. First parameter Pd text is Devanagari character string, current _ character _ pos is from 
where consonant conjunct starts. The function return an integer where the succeeding vowel position in the string exists, to 
which consonant conjunct should be attached to form a syllable. For example, if next to char function is called with 

Pd text ="3T£fR" and different current _ char _ pos the following output is obtained. The above string can be obtained with 
"3T+3rKH-^ + T" .Here are five phones in this Devanagari string 

Pubic R;ir icr. r.2Uti>±ai(p{!.KM . ciaiar.t_char_p05) Ailr.teEar 

" Tr.Li Fu-.ctLar.fatuir.i sr. ir.taEat irjlicatixg r.ui-.bar <sf ph-ar.arnei tat car. ba cltiiar to iuccaafiuisivQ^al 
' — FIRSTCHARFBOMCUBHEHTPOSIS VOWEL 
If RaEax.Iii.Iat:rj(Mtfi(j>itaM. CLJiar.[_char_pcii. 1). vo^al) Tr.ar. 
If Ra EaxI Bj.Iat=rj(Mi{i{pcle^i ctJTar.[_±af _pos - 1 . 1) . ar.tiaf e) Ther. 
r.axttochar = etna r.t_char_pK - 1 

r.3xtio±ar = ctrfar.t_chaf_j>oi 
End If 
Em If 

"-— FIRST CHAR FROM CURRENT POS IS CON SONANT 
If RaEaxIiiIat:±(MLfi(piext. cuTar.r_char_poi. 1). cor.iQLant) Tr.ar. 

r.axttochaf = ruiT3".[_±aj_jmi 
If R-iE5JLliMatrj(lIifi(pflt2it. cujjar.[_c:hsr_pos - 1. 1). virama)Thar. 
r.axtt&±ar = c^ar.t_cr.a^_poi 
E^^If I^Eaxl5iIat:±(Mfc!{;p{l[a3i[ : cuT3:i_±af_po5 - 1 . 1). ar.tiar e) Thar. 

r.aMochar - cujrar.[_cr.af_pai - 1 
Et^If R2E3t.IiM=rtdj(5iIiti(p{H2M. ciaTar.tjrfcafjsos — 1 . 1). ncaira) Thar. 
If RaEaxIsilatchpsItfl^pSiaTti. curar.[_±ar_po;- 2. 1). aiar:) Thar. 

nax-ttoctaf = CLjrar.[_-±ar_poi — 2 
EL^a 

r.axtto±ar = cuiar.[_-±ar_pos - 1 
End If 

ELialf RaEaxl5iIat:rj(MLi!<p{itaM. cisrisr.t_-:laf_po> - 1. 1). '?") Thar, 
r.axttodiar = r.axtfiKhar(psl[ax[ : cujrar.[_-±ar_poi— 1) 
If RaEax.IiiIat-:h(lIta^p{ltax[. ■njrar.[_-±ar_p3i- 2. 1). vbarra) Ther. 

nax-ttocfcaf = cuTar.[_-±af _poi — 1 
ELia 

r.ax[K>±ar = iEX[D-±£r{p&[ax.[. CLrrar.[_±aj_jKii -1) 
End If 
End if 

ELralf RaEaxIiIaEhpilLfi<p{lt2it. cujiar.t_:har_po;. 1). '?') Than 

r.aKttacfcaf = iExtb:tat(p£iaiti, cwfar.[_±3f_tKii — 1) 
ELsalf RaEaxI iJachpiIlS^pitast. CLJiar.t_:har_[>si. 1). vvanca) Thar. 

r.aMtoihaf = cLirar.[_chaf_poi 

End If 

Ratur. nextfcchar 
End Frr.ctLor. 

Figure 2: Algorithm to Find Succeeding Syllable Position to which the 
Consonant Conjunct from Current _ Char _ Pos can be Attached 
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D eel a rati onforthe Al gorithrn expl ai ned, 
vowel = 'T3rai$#^^^^ J C 5 jfT3TT3-r311'' 

consonant = "T^^-SJ ■=■! ^ ^ -jI ^ ^ d. (5.-=, <i □ I d sje^LA <H i_| Lb sj ^ <H -H 4. <rt d 3 1 <H 
matra = «[OTf3-<5^CH^d«otot]" 
vi ram a = "[j\s I ]" 
a n u s a rg = "[■::■■:■:]" 



Figure 3: Declaration for the Next to Char Algorithm 



Current _ Char _ Pos 


Next to Char(Pd Text, 
Current _ Char _ Pos) 


Explanation 


1 


1 


First is independent Vowel Character 


2 


4 


Consonant conjuct starts at pos=2 and is 
attached to vowel /consonant at pos=4 


5 


1 


Consonant starts and ends at pos 5 



Figure 4: Output of Next to Char Function for Various Current _ Char _ Pos and Pd text="3f8HT" 
Linguistic Syllable Unit to Phoneme 

The Linguistic Syllable Unit is aligned by manually constructing the mapping table which tries all possible 
alignments of letter and phones. This is the technique used by Alan et.al (1998). 

• The purposes of aligning this LSU with phoneme utterances are 

• To generate synthetically a pronunciation of unseen syllable from the available utterances. 

• To generate grapheme phoneme alignments 

• To construct statistics of synthetically generated phoneme utterances. 

FtmtifM: GR.4I«EM2I«OS"EME(LSU> AS STRING 
U : =A5W{Mid(LSUi : i )> 
SeL^:t Case 

Case 1 Len(L5IJi=l Return Ul 

CjhJ Lerj(L5U)=l if exist (iilou2 ) reiLjr. tlcoJ else (tl-t2 ) 
Cass 3 if €xts«(iileiu2ais3) tstoz vlmJaiJ 

Els if exi3t(i;lau2) reHirr. (tloii — id) 

El-seif e7Ltst(BloEj) returr/Tj— v2e>v3} 

Else if not exi^tlotJ) arjJr.ot exts< u2au3) ietuaT-( tl— lJ-iJ) 
Case 4 if exi.a(iiloi£.oi»3Qii4) rerun: tlccJotjoi^t 

Else if exist (i;laiilou3) rerun: (rlotlouji— 1^4) 

ELseif exist (u2au3<Ms4) return (ul-vJeujeud) 

Etseif exist(ulauJ) ane exisr{t2c<ujv returr. (uloul — ujiou4) 

ELseifexist(tloij;i STJi r.-3t exis^uilJuji) retur:(iJ.c"u2 — lJ — iAj 

ELseif not exUt(i;loiu2) and exls^iJoi^i rerun: (ul— u2 — u3ou4) 

ELseif r.ot existduloaxZ) ani r.ot axist^uJouj) re turr. (ul— uJ — uj — u4) 
Ere Select 
END FUNCTION 

Figure 5: Algorithm for Finding Closest Matching Phoneme Sequence 

As explained, while automatically naming the utterances recorded a character "o" is inserted in between 
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corresponding decimal Unicode numbers For example file name corresponding to "3T is 2309. Similarly, filename 

corresponding to "?T = "3T + 3F' is 2332o2334. Character 'o' between Unicode indicates it is original utterance. It can also 

be represented as 2332+2334 where + between Unicodes indicates it is not original utterance, but synthesized. Thus, using 
above algorithm one can map to the signal files available on the system with closest matching syllable. The above 
algorithm segments each linguistic Syllable unit into possible sequences, while matching the utterance signals available 
with system. The closest matching syllable can be selected by counting number 'o' characters in the Unicode based file 
name discussed above. If number of "o" s is more than the number of "+"s then the grapheme to phoneme alignment can 

be understood closest matching grapheme to phoneme. "•93T" for example, is Linguistic syllable unit then the above 
function can return following values 





If exist(ulou2ou3) 


2358o2352o2351 


Numberofo's=2+'s=0 




Else if exist(ulou2) 


2358o2352 +2351 


Number of o's=l +'s=l 




Else ifexist(u2ou3) 


2358+235 lo2351 


Number o's=l +'s=l 




Elseif not exist(ulou2) and 
not exist( u2ou3) 


2358+2352 +2351 


Number o's=o +'s=2 



Figure 6: Output of Grapheme Phoneme Function 



RESULTS 

The utterances created included isolated basic phone set with pattern C, V, CM, CA, CMA and VCH. 
Other triphonic utterances are also recorded using the words listed in Sanskrit dictionary. Linguistic Syllable Unit was 
segmented using Closest matching Syllable algorithm explained. Because basic phoneset was available, grapheme to 
phoneme alignment was possible atleast with maximum editing distance, if not with closest matching syllable. 
The Unicode data available with the University is used for testing with Syllabification and LSU to Phoneme Alignment 
module. The output of the above modules were manually got verified with Sanskrit linguists. The results showed cent 
percent accuracy. It was felt that this result could be because, syllabification is done purely on the rule base, whereas other 
researchers depended on the statistical and data oriented methods. 

CONCLUSIONS 

From the experiment results Rule Based Segmentation Technique for syllabification of Unicoded Sanskrit Text 
and Closest Matching Algorithm produced impressive results. These two modules could successfully be integrated with 
TTS for Sanskrit. These algorithms were experimented with semi automatically segmented phonemes. It was thought the 
alignment of Linguistic syllable Unit should be performed on automatically segmented phoneme utterances. This rule 
based algorithm can be successfully tried on any syllable centric orthographic Indian languages. 
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